Q-value Additional Updating Method for Reducing Learning Time
Abstract
In this paper, a Q-value update method is proposed to speed up Q-learning. When the Q-value of an executed action is small, even if the action is optimal, learning takes longer because that action is selected again less frequently. The proposed method raises the execution frequency of the optimal action by forcibly increasing its Q-value through an additional update. In tests, the goal achievement rate improved by up to 60%.
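The abstract does not give the actual update rule, so the following is only a minimal sketch of the general idea: a standard tabular Q-learning update followed by an extra step that lifts the Q-value of an action that just improved but is still not the greedy choice. The function names, the `margin` parameter, and the boost condition are all assumptions made for illustration and should not be read as the paper's method.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the TD error."""
    best_next = max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


def additional_update(q, state, action, td_error, margin=1e-3):
    """HYPOTHETICAL extra step (an assumption, not the paper's rule).

    If the action just improved (positive TD error) but its Q-value is
    still below the best Q-value in this state, raise it just above that
    maximum so a greedy policy selects it more often.
    """
    current_max = max(q[state])
    if td_error > 0 and q[state][action] < current_max:
        q[state][action] = current_max + margin


def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


# Two states, two actions. Action 1 in state 0 starts with a small Q-value.
q = [[0.5, 0.0], [0.0, 0.0]]
td = q_update(q, state=0, action=1, reward=1.0, next_state=1)
additional_update(q, state=0, action=1, td_error=td)
```

With the standard update alone, action 1 in state 0 would still trail action 0 after one rewarded step, so a greedy policy would keep ignoring it. The extra step is what makes it the preferred action immediately, which is the effect the abstract describes.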
Related articles
Reinforcement Learning in Continuous Time: Advantage Updating
A new algorithm for reinforcement learning, advantage updating, is described. Advantage updating is a direct learning technique; it does not require a model to be given or learned. It is incremental, requiring only a constant amount of calculation per time step, independent of the number of possible actions, possible outcomes from a given action, or number of states. Analysis and simulation ind...
Reinforcement Learning Applied to a Differential Game
An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a p...
P14: Anxiety Control Using Q-Learning
Anxiety disorders are among the most common reasons for referral to specialized clinics. If the response to stress is changed, anxiety can be greatly controlled. The most obvious effect of stress occurs on the circulatory system, especially through sweating. The electrical conductivity of the skin, in other words the Galvanic Skin Response (GSR), which depends on stress level, is used; besides this parameter pe...
Advantage Updating Applied to a Differential Game
An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual gradient form of advantage updating. The game is a Markov Decision Process (MDP) with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile a...
On-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (γ) on the learning speed...